We present a method to measure dominant Standard Model (SM) backgrounds using data
containing high rapidity objects in pp collisions at the Large Hadron Collider (LHC). The method
is developed for analyses of early LHC data, when robustness against imperfections of background
modeling and detector simulation can be key to the discovery of new physics at the LHC.

I. INTRODUCTION

The LHC will soon start operating in an unexplored 
energy regime at $\sqrt{s} \sim 14$ TeV, about seven times higher
than that achieved at the Tevatron. At that center-of- 
mass energy, a large number of new particles could be 
produced even in a data sample of modest integrated 
luminosity. The challenge is to distinguish events with 
new particles from those, many orders of magnitude more 
copious, attributed to the SM, and to do so using tools 
and methods appropriate for early data. The challenge 
is magnified by the fact that signatures of the physics 
beyond the SM realized in nature are not known. 

Heavy new particles are produced, approximately at 
threshold, via interactions of energetic partons. Their de- 
cay products tend to be distributed uniformly over solid 
angle, which corresponds to a narrow central rapidity 
region. SM particles are light on the mass scale of
14 TeV and tend to be produced in interactions of soft, 
often very asymmetric in energy, partons. They receive a 
significant boost along the beam line, which makes them 
distributed over a wide rapidity range. 

In this paper, we present a new method to measure 
dominant SM backgrounds in searches for heavy new par- 
ticles. It uses data containing high rapidity objects to 
predict SM yields at small rapidity. We apply this to 
the SM processes Z+jets, W+jets, $\gamma$+jets, QCD jets
and $t\bar{t}$, which are the largest background sources in many
new physics searches. We also discuss the usage of a ra- 
tio constructed from event yields in central and forward 
rapidity regions as a generic search variable. 

The method is presented in the context of a new 
physics search involving leptons, photons, jets and miss- 
ing transverse energy. In the absence of a single most 
compelling model of new physics, the search is developed 
in a model independent way. The only assumption we 
make is that new particles are heavy and they decay to 
SM particles via a multi-stage cascade producing a large 
number of jets, so that the number of jets is a main search 
variable. A key feature of our method is that system- 
atic uncertainties associated with incomplete knowledge 
of the SM production rates and detector artifacts cancel 
to first order. The emphasis throughout is on robustness 
against imperfections of background modeling required 
for new physics searches in early LHC data. 



II. METHOD OVERVIEW 

We consider final states involving many jets, 4 or more.
The SM V+jets production rates, where for brevity V
stands for a Z, W, $\gamma$ or a jet 0], fall steeply as the number
of jets grows, but they are difficult to predict from
first principles. Monte Carlo (MC) techniques are unreliable
in predicting backgrounds with a large number of
jets. Theory calculations 0] do not exist at sufficiently
high order. The structure functions have significant
uncertainties for partons carrying a small fraction, x, of the
proton momentum, which is the regime relevant for the LHC \A\. Large
uncertainties in the calibration of the experimental
apparatus are expected in early data taking. For these
reasons, instead of relying on MC simulation of the detector
response to SM processes, we use control regions in
data to determine dominant SM backgrounds. We identify
control samples in kinematic regimes where the SM
dominates and extrapolate backgrounds measured there
into the signal region where new physics may contribute.
In V+jets, the SM dominates when the transverse
momentum, $|p_T|$, of V or the number of jets, Nj, is small.
These control regions have been used previously for data-based
background determination. We use, in addition,
control samples with high rapidity objects that are
background dominated even when $|p_T^V|$ or Nj is large. Jet
rapidity has been successfully used previously in di-jet
resonance searches at the Tevatron [g.

Figure 1 shows the (pseudo-)rapidity distributions for
Z+jets (a), W+jets (b), $\gamma$+jets (c), and multi-jets (d).
In the Z+jets channel, we use the rapidity of the Z boson,
$y_Z$, as a key discriminating rapidity variable. The
W boson rapidity cannot be unambiguously determined
due to the undetected neutrino. We instead use the lepton
pseudo-rapidity, $\eta_{lepton}$, for W+jets. The
pseudo-rapidities of the photon, $\eta_{\gamma}$, and the highest $|p_T|$ jet, $\eta_{jet}$,
are used for $\gamma$+jets and multi-jets, respectively. As seen
in Figure 1, the (pseudo-)rapidity distributions for
decays of new massive particles are central, while those for
the SM processes are approximately uniform over a wide
rapidity range. Furthermore, the rapidity distributions
vary slowly as the number of jets increases.
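As a reminder of the two tag variables used above, the following minimal sketch computes rapidity and pseudo-rapidity from a four-momentum using their standard definitions. The function names and the (E, px, py, pz) input convention are illustrative assumptions, not part of the analysis code behind this paper.

```python
import math

def rapidity(energy, pz):
    """Rapidity y = 0.5 * ln((E + pz) / (E - pz)), with z along the beam line."""
    return 0.5 * math.log((energy + pz) / (energy - pz))

def pseudorapidity(px, py, pz):
    """Pseudo-rapidity eta = -ln(tan(theta / 2)), theta being the polar angle."""
    pt = math.hypot(px, py)           # transverse momentum
    theta = math.atan2(pt, pz)        # polar angle with respect to the beam
    return -math.log(math.tan(theta / 2.0))

# For a massless object the two quantities coincide.
y = rapidity(13.0, 12.0)
eta = pseudorapidity(3.0, 4.0, 12.0)
```

For a massive object such as a Z boson the two differ, which is why the rapidity is used when the full four-momentum is reconstructed and the pseudo-rapidity otherwise.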

The object providing the discriminating rapidity variable
is called a tag 0]. We use events with forward tags
to determine backgrounds for events with central tags,
using an algorithm described in section IV.

FIG. 1: Rapidity of Z bosons from SM Z+jets (a), pseudo-rapidity of charged leptons from SM W+jets (b), pseudo-rapidity of
$\gamma$ for SM $\gamma$+jets (c), and pseudo-rapidity of the highest $|p_T|$ jet in SM QCD multi-jet events (d). Generator level requirements
of $|\eta_{\gamma}| < 3.0$ and $|\eta_{jet}| < 4.0$ are imposed in plots (c) and (d). Shapes of rapidity distributions from LM4 and LM6 mSUGRA
benchmark points QJI are shown by black hatched histograms in the Z+jets and W+jets cases, respectively.

In this paper, for brevity, we discuss searches at high
Nj, since Nj is a particularly simple and robust variable.
Other distributions considered in our search include: the
highest jet $|p_T|$ ($|p_T^{lead}|$) and the $J_T = \sum |p_T|$ spectra
in each Nj bin; and weighted Nj distributions, which are closely
related to Nj but obtained as a sum of weights of either
$|p_T^{lead}|$ or $J_T$ in each Nj bin. The weighted
distributions have higher discriminating power compared to the
unweighted Nj distributions since new particles are expected to be
heavy. However, reliance on the $|p_T^{lead}|$ or $J_T$ spectra is
more susceptible to uncertainties in the jet energy scale.



III. EXPERIMENTAL ASPECTS 

The ATLAS and CMS experiments use multi-purpose
detectors that are in the final stages of construction at the
European Organization for Nuclear Research (CERN).
Detailed descriptions of the detectors can be found in
Ref. [8]. Of primary importance for our studies are
the detectors' rapidity coverages and kinematic thresholds.
The detectors are capable of efficiently reconstructing
electrons and muons with low fake rates for lepton
$|p_T| > 20$ GeV within $|\eta| < 2.5$. Photons and jets are
reconstructed in the $|\eta| < 2.5$ and $|\eta| < 3.0$ ranges,
respectively. Missing transverse energy, $E_T^{miss}$, is
calculated using $E_T$ measurements of all reconstructed
objects in each event. Mis-measured or mis-reconstructed
objects, calorimeter noise, malfunctioning detector
subsystems and channels, and background unrelated to pp
collisions constitute sources of unphysical $E_T^{miss}$ that may
complicate the usage of $E_T^{miss}$ in early searches.
Accordingly, we perform studies with and without a requirement
on $E_T^{miss}$ in the event selection.

To study the effectiveness of the method, we have
produced mock data samples for the following SM
processes: Z+jets (5.0 fb$^{-1}$, up to 5 partons, $Z \to l^+ l^-$),
W+jets (1.0 fb$^{-1}$, up to 5 partons, $W \to l\nu_l$),
$t\bar{t}$ (1.0 fb$^{-1}$, up to 4 partons, $t\bar{t} \to l\nu_l b\bar{b}jj$ and
$t\bar{t} \to l\nu_l l\nu_l b\bar{b}$), $\gamma$+jets (400.0 pb$^{-1}$, up to 5 partons) and
QCD jets (1.0 pb$^{-1}$, up to 5 partons), where $l$ is $\mu$ or
e. The integrated luminosity indicated in parentheses
for each channel specifies the sample size used in our
studies, except where specified otherwise. These samples
were generated with ALPGEN [9], and PYTHIA
was used for parton showering, hadronization, simulation
of the underlying event and jet reconstruction. To
model features of a new physics signal in search
distributions, we produced mock signal data samples for Minimal
Supergravity (mSUGRA) benchmark points LM4
and LM6 using PYTHIA.

Kinematic selection criteria are applied as follows. 
Electrons and muons are required to have $|p_T|$ of at least
20 GeV in the $|\eta| < 2.5$ range. Photons are reconstructed
above the $|p_T|$ threshold of 30 GeV in the $|\eta| < 2.5$ range.
Jets are reconstructed using the PYCELL algorithm [10]
and required to be within $|\eta| < 3.0$ for $|p_T|$ thresholds
varying between 30 and 100 GeV. Low thresholds are
used for background studies, while higher thresholds are 
used to study signal dominated regions. 

Detector response is not directly simulated, although
an assumed reconstruction efficiency of 50% is applied
in each channel. The $E_T^{miss}$ vector is approximated
by a vector opposite to the sum of $\vec{p}_T$ measurements
of charged leptons, photons, and jets. Using
the $\gamma$+jets sample, we find that the jet energy resolution
function in our mock data samples is approximately
Gaussian with a $\sigma$ varying from about 15% at 30
GeV to about 8% at 100 GeV. To simulate effects of
$E_T^{miss}$ mis-modeling due to jet energy fluctuations with
non-Gaussian tails and incomplete hermeticity of the
detectors, we perform robustness tests where jet energies
are varied according to the hypothetical probability density
function shown in Figure 2 and jets are removed in
selected regions, as described in section VI.

FIG. 2: A hypothetical probability density function used for
jet energies in modeling the effect of artificial $E_T^{miss}$.
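The approximation of the missing transverse energy vector described above can be sketched as follows. This is a minimal illustration, assuming each reconstructed object is given as a hypothetical (pT, phi) pair; it is not the code used to produce the mock data samples.

```python
import math

def missing_et(objects):
    """Approximate the missing-ET vector as the negative vector sum of the
    transverse momenta of all selected leptons, photons and jets.

    objects: iterable of (pt, phi) pairs, with pt in GeV and phi in radians.
    Returns (magnitude, azimuthal angle) of the missing-ET vector.
    """
    sum_px = sum(pt * math.cos(phi) for pt, phi in objects)
    sum_py = sum(pt * math.sin(phi) for pt, phi in objects)
    met_x, met_y = -sum_px, -sum_py   # opposite to the visible sum
    return math.hypot(met_x, met_y), math.atan2(met_y, met_x)

# Two back-to-back objects of 50 and 30 GeV leave 20 GeV unbalanced.
met, met_phi = missing_et([(50.0, 0.0), (30.0, math.pi)])
```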

These selection criteria and sample sizes are chosen
generically and are not optimized for any new physics
model. The new physics reference models listed above
are used only for illustration. Our goal in this paper is to 
demonstrate the scope of the method and its performance 
rather than to attain high sensitivity to a specific model 
for a specific final state or quantify that sensitivity. 

IV. ALGORITHM 

To describe and illustrate the algorithm and tests of its 
robustness, in the next several sections we center the dis- 
cussion on the Z+jets channel. The discussion applies to 
all four V+jets channels, however, and differences among
these channels are pointed out where significant. 

The rapidity range for reconstructed Z bosons passing
realistic event selection criteria is reduced (Figure 3). We
define forward events as those with a Z boson having
$|y_Z| > 1.3$, and we call the detector region with $|\eta| > 1.3$
the forward region. Central events are defined as those
with a Z boson at $|y_Z| < 1.0$, and the central region of
the detector as that having $|\eta| < 1.0$. (This definition of
central and forward categories is arbitrary and could be
modified without significant effect.)
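The forward and central categories defined above amount to a simple cut on the absolute tag rapidity. A minimal sketch follows; the function name and the convention of returning None for tags in the gap between the two regions are illustrative assumptions.

```python
def classify_tag(tag_rapidity, central_max=1.0, forward_min=1.3):
    """Classify an event by the (pseudo-)rapidity of its tag object.

    Returns "central" for |y| < central_max, "forward" for |y| > forward_min,
    and None for tags that fall in the gap between the two regions.
    """
    abs_y = abs(tag_rapidity)
    if abs_y < central_max:
        return "central"
    if abs_y > forward_min:
        return "forward"
    return None

labels = [classify_tag(y) for y in (0.2, -0.9, 1.1, -1.8, 2.4)]
```

Keeping the two thresholds as parameters reflects the remark that the boundaries are arbitrary and can be varied.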

Small Nj bins are SM dominated for both central and 
forward events, and we use them to predict the SM con- 
tribution to the central, high Nj bins where signal would 
appear. This is done by measuring a ratio, denoted
as $R_{N_j}$, of the central yield ($Y^{Central}_{N_j}$) to the sum of
forward ($Y^{Forward}_{N_j}$) and central yields in each Nj bin:

$$R_{N_j} = \frac{Y^{Central}_{N_j}}{Y^{Forward}_{N_j} + Y^{Central}_{N_j}}.$$

A fit to $R_{N_j}$ is made in the low Nj bins and extrapolated into
the high Nj region. The extrapolated ratios and the 
yields of forward events in high Nj bins are combined to 
obtain a background prediction in the central, high Nj 
signal region. 
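The steps of the algorithm can be sketched in a few lines. The text does not specify the functional form of the fit, so the unweighted straight-line fit below is an assumption made purely for illustration, as are the function and argument names. The prediction uses the relation obtained by inverting the definition of the ratio, Central = Forward * R / (1 - R).

```python
import numpy as np

def predict_central_yields(n_jets, central, forward, fit_bins, predict_bins):
    """Predict central yields at high Nj from forward yields.

    Assumption (not from the paper): R is modeled as a straight line in Nj
    and fitted without weights in the low-Nj control bins.
    """
    n_jets = np.asarray(n_jets, dtype=float)
    central = np.asarray(central, dtype=float)
    forward = np.asarray(forward, dtype=float)

    # Ratio of central yield to the sum of forward and central yields.
    ratio = central / (central + forward)

    # Fit the ratio in the SM-dominated low-Nj bins.
    in_fit = np.isin(n_jets, fit_bins)
    slope, intercept = np.polyfit(n_jets[in_fit], ratio[in_fit], deg=1)

    predictions = {}
    for nj in predict_bins:
        r_extrap = slope * nj + intercept          # extrapolated ratio
        fwd = forward[n_jets == nj][0]             # observed forward yield
        predictions[nj] = fwd * r_extrap / (1.0 - r_extrap)
    return predictions
```

A call such as `predict_central_yields(nj, central, forward, fit_bins=[1, 2, 3], predict_bins=[4, 5])` returns a dictionary of predicted central yields keyed by Nj. A realistic implementation would weight the fit by the statistical uncertainties of each bin and propagate the fit uncertainty into the prediction.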

The accuracy of this background prediction can be
tested in mock data samples by comparing it to the yield
in the central region at high Nj. This estimated-to-observed
comparison is shown as a function of Nj in
Figure 4 for Z+jets, W+jets, $\gamma$+jets and pure QCD
jets. The prediction is made using fits in 1 ≤ Nj ≤ 3
for Z+jets and W+jets. For $\gamma$+jets and multi-jets,
2 ≤ Nj ≤ 4 is used. The observed central yield at high
Nj is well matched to the prediction in all cases. Pull
distributions, defined as $(N_{observed} - N_{estimated})/\sigma_{stat}$,
where $N_{observed}$ is the observed number of central events,
$N_{estimated}$ is the number of central events estimated using
the algorithm and $\sigma_{stat}$ is the total statistical
uncertainty, are in the bottom plot of the same Figure in
black markers of the appropriate shape for each channel.
Shaded markers in the bottom plot show how the pulls
change with the addition of a 1% relative systematic
uncertainty in each Nj bin. With at most a small systematic
uncertainty, the algorithm estimates the background
in the central region accurately.
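The pull defined above can be computed as in the following sketch. How the relative systematic uncertainty is combined with the statistical one is not stated in the text; adding it in quadrature, relative to the estimated yield, is an assumption made here for illustration.

```python
import math

def pull(n_observed, n_estimated, sigma_stat, rel_syst=0.0):
    """Pull between observed and estimated central yields.

    Assumption (not from the paper): a relative systematic uncertainty,
    taken as a fraction of the estimated yield, is added in quadrature
    to the statistical uncertainty.
    """
    sigma = math.hypot(sigma_stat, rel_syst * n_estimated)
    return (n_observed - n_estimated) / sigma

stat_only = pull(110.0, 100.0, 10.0)                  # statistical only
with_syst = pull(110.0, 100.0, 10.0, rel_syst=0.01)   # plus 1% systematic
```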

The results in Figure 4 are obtained with a jet threshold
of 30 GeV. A higher threshold would likely improve
signal sensitivity, but it could also affect the algorithm's
performance. As the jet threshold changes, the $R_{N_j}$
values may change, but the low Nj fit should properly
account for any difference. We search for the presence of
biases by varying the jet threshold between 30 and 100 GeV
and repeating the tests in Figure 4 (e) for the Z+jets and
W+jets channels. No evidence of a bias is found.

FIG. 3: Rapidity of Z bosons from the SM Z+jets production
in the fiducial coverage of LHC detectors. The central
signal region is indicated by solid arrows. The background
dominated region is at $|y_Z|$ larger than that indicated by
dashed arrows. The Z rapidity shape from a LM4 mSUGRA
benchmark is shown by the black hatched histogram.

The performance of the algorithm when signal is
present is illustrated in Figure 5, where we compare the
central yields and the predictions with and without a
signal contribution. A clear excess of signal above the
background prediction is seen at large Nj. The integrated
luminosity of the data sample in this Figure is
1 fb$^{-1}$, and a jet threshold of 50 GeV is used. Square
markers show Nj distributions without a requirement on
missing energy. The effect of a missing energy requirement
is discussed in section VI.



V. ROBUSTNESS 

The main goal of our method is robustness against
imperfections of the SM background modeling and detector
simulation. By design, uncertainties in the background
cross-section are accounted for by normalizing to the yield
in the forward region. In addition, any systematic effect
present in data should be taken into account by the
background estimate, as long as the biases in the $R_{N_j}$ ratios
associated with the effect are a linear or slowly varying
function of Nj.

To examine the robustness of our method, we present 
a few illustrative tests. In each test, a change to the 
mock data samples is made and the analysis procedure 
is repeated. The results are presented in the form of 
pull distributions in Figure 6, where only statistical uncertainties
are used to normalize the differences between
observed and estimated numbers of events. 

The composition of the SM Z+jets sample, or other 
samples with a large number of jets, could differ from 
the ALPGEN predictions. To test the effect of such 
mis-modeling, we separate the Z+jets sample into two 
subsamples with an even {0,2,4} and odd {1,3,5} num- 
ber of ALPGEN partons and apply the analysis proce- 
dure to these subsamples. This is a particularly stringent 
test as it introduces drastic bin-to-bin variations in the 
Nj distributions. However, we find that the background 
is estimated accurately in most bins [Figure 6 (top, bin
range from 0 to 19)]. There are two bins, in W+jets and
$\gamma$+jets, where the observed and estimated yields differ by
about 3 standard deviations. These biases are attributed
to changes in $R_{N_j}$ associated with the migration of events
from higher to lower Nj bins. An event with n jets
reconstructed in the (n-1) Nj bin has a higher probability
to be a forward event, as forward jets are lost more often 
and the tag rapidity is correlated, although weakly, with 
the rapidity of the jet system recoiling against the tag. 

Efficiencies for forward and central leptons are differ- 
ent. One might account for these differences by applying 
efficiency corrections measured from data, but these cor- 
rections will have significant uncertainties in early data 
taking. To test the robustness of the method against mis- 
modeling of lepton reconstruction efficiencies, we change 
forward or central efficiencies by 30%. We find that the 
background estimate remains accurate [Figure 6 (top, bin
range from 20 to 39)] [3.

Similarly, lepton fakes introduce background in the 
Z+jets and W+jets channels, and photon fakes in the
$\gamma$+jets channel. Because the lepton and photon fake
rates are expected to be a slowly varying function of Nj, 
background from such fakes should be accounted for ac- 
curately in our method. When we add a small fraction of 
multi-jet events to the mock data samples, they do not 
significantly bias the prediction. 

Significant uncertainties in the jet reconstruction
efficiencies are expected during early data taking. To
test the robustness of the method against such
inefficiencies, jets are removed randomly with 30%
probability. We find that the background estimate remains
accurate [Figure 6 (top, bin range from 40 to 59)]. More
demanding tests related to jet reconstruction efficiency
and jet energy mis-measurements are presented below in
section VI.
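The jet inefficiency test described above can be emulated with a few lines of code. This is a minimal sketch under the assumption that each jet is dropped independently; the function name and the jet representation are illustrative.

```python
import random

def drop_jets(jets, p_drop=0.30, rng=None):
    """Remove each jet independently with probability p_drop.

    jets: list of arbitrary jet records (e.g. tuples of pt, eta, phi).
    rng: optional random.Random instance, for reproducibility.
    """
    rng = rng or random.Random()
    return [jet for jet in jets if rng.random() >= p_drop]

# After dropping jets, Nj is recomputed and the analysis is repeated.
jets = [(60.0, 0.3, 1.2), (45.0, -1.7, 2.9), (35.0, 2.4, -0.5)]
surviving = drop_jets(jets, p_drop=0.30, rng=random.Random(1))
n_j = len(surviving)
```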

We have confirmed that effects associated with uncer- 
tainties in the parton distribution functions are accom- 
modated by our method and do not bias the background 
prediction. The algorithm was also found to be robust in 
other tests not discussed here. 
FIG. 4: The Nj distributions for Z+jets (a), W+jets (b), $\gamma$+jets (c) and pure multi-jets (d). The backgrounds in the central
regions are shown in black markers; their estimates are in shaded markers of the same shape, displaced horizontally for visibility.
Bottom plot: pull distributions for Z+jets (black squares), W+jets (black circles), $\gamma$+jets (black triangle-up) and pure
multi-jets (black triangle-down). Here, Nj is offset by 10 between samples for visibility, i.e., Nj = Test Bin mod 10. Shaded markers
in the bottom plot show how the pulls change after the addition of a 1.0% relative systematic uncertainty in each Nj bin.

FIG. 5: Nj distributions for Z+jets (black markers) and a
mixture of Z+jets and events from LM4 mSUGRA benchmark
(shaded markers: estimated central SM background,
open markers: all central events). This comparison is made
with a 50 GeV jet energy threshold and a sample size
corresponding to 1 fb$^{-1}$. The effect of a $E_T^{miss} > 50$ GeV
requirement for a sample with the jet energy mis-modeling discussed
in section VI is shown by the circles.



VI. 



In the results presented above, no requirement is made
on missing transverse energy, $E_T^{miss}$. Requiring large
$E_T^{miss}$ could significantly suppress SM backgrounds, and
it is expected to be efficient in a large class of new physics
models, e.g., R-parity conserving SUSY searches [11], Hi-
It is challenging to rely solely on $E_T^{miss}$ in analyses of early
data, because $E_T^{miss}$ is particularly difficult to model.
However, it could be useful as an additional discriminator
against SM backgrounds in the context of our algorithm.

Unphysical sources of $E_T^{miss}$ include those associated
with jet energy fluctuations, noise and inefficient regions
of the calorimeters, which could all be larger in the forward
region. Our method is expected to work well with
a $E_T^{miss}$ requirement, nonetheless. The rapidity of the
tag is only weakly correlated with the rapidity of the jet
system recoiling against the tag due to the boost along
the beam line in the laboratory frame. As a result, the
$E_T^{miss}$ in the tag recoil system tends to be averaged over
the entire rapidity coverage. Remaining effects can be
accounted for by low Nj bin fits to $R_{N_j}$.

FIG. 6: Pulls between observed and estimated numbers
of events for Z+jets (squares), W+jets (circles),
$\gamma$+jets (triangle-up) and pure multi-jets (triangle-down) from
robustness tests in section V (top) and from tests with a
requirement on $E_T^{miss}$ in section VI (bottom). Top: Nj test
bins in ranges [0; 19], [20; 39] and [40; 59] correspond to tests
without a requirement on $E_T^{miss}$ consisting in changing the
composition of the ALPGEN sample ({0,2,4} and {1,3,5} partons),
lepton/photon efficiencies (over the entire $\eta$ range and
in the forward region) and jet efficiencies (over the entire $\eta$
range and in the forward region), respectively. Bottom: Nj
test bins in ranges [0; 19], [20; 39] and [40; 59] are from tests
with a $E_T^{miss}$ or $M_T$ requirement, for different composition of
the ALPGEN sample ({0,2,4} and {1,3,5} partons), hypothetical
holes (over the entire $\eta$ range and in the forward region)
and fluctuations in jet energies (over the entire $\eta$ range and
in the forward region), respectively. In each test, pulls in the
two highest Nj bins are plotted. (Note that pulls in these tests
are correlated, as tests are made using events drawn from the
same mock data samples.)

We have made a set of robustness tests with a requirement
on $E_T^{miss}$ by introducing mis-measurements and
evaluating the consistency of the method's predictions.
We require $E_T^{miss} > 50$ GeV [15] for Z+jets, $\gamma$+jets and
multi-jets. In W+jets, the undetected neutrino is a
source of genuine $E_T^{miss}$, and requiring $E_T^{miss} > 50$ GeV
would have little effect. Instead, we impose a requirement
on the transverse mass, $M_T$, which is constructed from
$E_T^{miss}$ and the lepton's transverse momentum. Requiring
$M_T > M_W + x$ GeV, where $M_W$ is the W mass, is approximately
equivalent in suppressing SM W+jets to requiring
$E_T^{miss} > x$ GeV for SM Z+jets. For robustness tests in
the W+jets sample, we require $M_T > M_W + 50$ GeV. In
all four channels, the angle between the highest $|p_T|$ jet
and the missing transverse momentum is required to be
larger than 0.15.
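The text states only that the transverse mass is built from the missing transverse energy and the lepton transverse momentum. The sketch below uses the conventional definition for a massless lepton and neutrino, which is an assumption about the exact form used here; the W mass value and the function names are likewise illustrative.

```python
import math

M_W = 80.4  # GeV, approximate W boson mass (illustrative constant)

def transverse_mass(lepton_pt, met, delta_phi):
    """Conventional transverse mass of a lepton plus missing-ET system:
    M_T = sqrt(2 * pT(lepton) * ETmiss * (1 - cos(delta_phi))),
    where delta_phi is the azimuthal angle between the two vectors."""
    return math.sqrt(2.0 * lepton_pt * met * (1.0 - math.cos(delta_phi)))

def passes_mt_requirement(lepton_pt, met, delta_phi, x=50.0):
    """Requirement M_T > M_W + x GeV used for the W+jets channel."""
    return transverse_mass(lepton_pt, met, delta_phi) > M_W + x

mt = transverse_mass(40.0, 40.0, math.pi)  # back-to-back lepton and missing ET
```

For W decays with well-measured missing energy, M_T has an endpoint near the W mass, which is why a threshold above M_W plays the role of a missing energy cut in this channel.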

We repeat tests related to the ALPGEN composition
of the mock data samples with a requirement on $E_T^{miss}$.
To emulate the effect of holes in the detector coverage,
we completely remove jets that fall within a cone
of $\Delta R = \sqrt{\Delta\eta^2 + \Delta\phi^2} < 0.7$ around three points in
the detector, at $\eta = 0$ and $\eta = \pm 2$, each at $\phi = 0$.
The energy of each jet is varied according to the hypothetical
probability density function shown in Figure 2,
which includes wide non-Gaussian tails. Pulls between
the observed and estimated numbers of events in high
Nj bins from these tests are shown in Figure 6 (bottom).
Good consistency between estimated and observed yields
is seen. In these tests, the predictions are made based on
only two Nj bins: 2 ≤ Nj ≤ 3 for Z+jets and W+jets,
and 3 ≤ Nj ≤ 4 for $\gamma$+jets and multi-jets. We find
that $R_{N_j}$ values in Nj = 1 for Z+jets and W+jets, and
Nj = 2 for $\gamma$+jets and multi-jets tend to decrease after an
additional requirement on missing energy for the reason
already discussed in section V. These bins are excluded
from the background prediction procedure. Events
reconstructed in higher Nj bins are less sensitive to this effect
since the correlation between $E_T^{miss}$ and tag rapidities is
weaker in events with multiple jets.
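The emulation of detector holes described above can be sketched as follows. Jets are represented by hypothetical (pT, eta, phi) tuples, and the hole positions follow the three points quoted in the text; the function names are illustrative.

```python
import math

# Hole centers (eta, phi): eta = 0 and eta = +-2, each at phi = 0.
HOLES = [(0.0, 0.0), (2.0, 0.0), (-2.0, 0.0)]

def delta_r(eta1, phi1, eta2, phi2):
    """Distance in the eta-phi plane, with the azimuthal difference
    wrapped into the interval [-pi, pi]."""
    dphi = math.remainder(phi1 - phi2, 2.0 * math.pi)
    return math.hypot(eta1 - eta2, dphi)

def remove_jets_in_holes(jets, holes=HOLES, radius=0.7):
    """Drop every jet (pt, eta, phi) lying within `radius` of any hole."""
    return [
        jet for jet in jets
        if all(delta_r(jet[1], jet[2], h_eta, h_phi) >= radius
               for h_eta, h_phi in holes)
    ]

jets = [(50.0, 0.1, 0.2), (40.0, 1.0, 0.0), (30.0, 2.0, 6.2)]
surviving = remove_jets_in_holes(jets)   # only the second jet survives
```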

The effect of a $E_T^{miss} > 50$ GeV requirement on a search
in the Z+jets sample with the jet energy mis-modeling
over the entire rapidity coverage is shown in Figure 5 in
round markers. The $E_T^{miss}$ requirement suppresses the
SM Z+jets rate, but the suppression is a function of Nj.
Nonetheless, our method continues to predict the back- 
ground accurately, and a signal excess is clearly apparent 
above the background prediction. 



VII. SM $t\bar{t}$

A search in the W+jets sample is complicated by the
top quark. The $t\bar{t}$ process, with one of the top quarks
decaying semileptonically and the other hadronically,
produces the same signature as that of W+jets. Due to the
large top quark mass, the W bosons from top decays tend
to be produced at small rapidities, and they increase the $R_{N_j}$
ratios over those of W+jets.

Figure 7 shows results of the analysis procedure applied
to a sample of W+jets and $t\bar{t}$ events, where the
fit to the $R_{N_j}$ distribution is made in 1 ≤ Nj ≤ 2. The
central yield is higher than the background prediction
because of the top contribution; the pull distribution in the
right column shows the significance of the $t\bar{t}$ excess. This
demonstrates that the method works in revealing decays
of massive particles, and it could be used to measure the
$t\bar{t}$ cross-section. However, $t\bar{t}$ complicates the search for
other massive particles.

One approach to searching beyond $t\bar{t}$ would be to
subtract the $t\bar{t}$ contribution, either using a prediction for
its cross-section, or an independent measurement.
Another approach is to include the $t\bar{t}$ background in the
fit. At high Nj, shifts in $R_{N_j}$ caused by $t\bar{t}$ are a slowly
varying function of Nj, so that the method should
accommodate the combined W+jets and $t\bar{t}$ contribution
in the background prediction.

FIG. 7: Results of the analysis procedure applied to the combined W+jets and $t\bar{t}$ sample for selection criteria defined in
section III. Left: Nj distributions for the combined W+jets and $t\bar{t}$ sample. Right: pull distributions for the plots in the left
column.

FIG. 8: Nj distributions for W+jets (black markers),
W+jets and $t\bar{t}$ (shaded markers) and a mixture of W+jets,
$t\bar{t}$ and events for LM6 mSUGRA benchmark (open markers).
Selection criteria on $M_T$ are given in the legend.

Low mass mSUGRA models are challenging for
searches in Nj as they produce Nj distributions peaking
in the region where the $t\bar{t}$ contribution is maximal.
Figure 8 illustrates this by comparing the central yield and
prediction with and without a signal contribution. The
LM6 mSUGRA benchmark is used and the comparison
is made for a sample size corresponding to 1 fb$^{-1}$. A
jet threshold of 50 GeV is used, and a transverse mass
requirement of $M_T > M_W + 150$ GeV is applied to
suppress SM backgrounds. There is a large signal
contribution at Nj > 4, but it is not easily discernible above
the central prediction made using 2 ≤ Nj ≤ 3. The
prediction is biased due to the residual $t\bar{t}$ contribution
bridging between the W+jets dominated low Nj region
and the signal dominated high Nj region. The $t\bar{t}$ and
signal contributions together are large enough to bias the
prediction. We discuss an alternative approach in the
next section.



VIII. SEARCH FOR NEW PHYSICS IN $R_{N_j}$

In the preceding discussion, we used fits to $R_{N_j}$ to
obtain a background prediction for the high Nj distribution
in central events and searched for excess signal there.
Alternatively, we can search for new physics solely in the
$R_{N_j}$ distributions. The $R_{N_j}$ ratios for heavy new
particles are larger than those for SM processes, and a search
for enhancements in the high Nj bins could reveal new
phenomena or provide generic bounds on them.

Figure 9 shows the $R_{N_j}$ distributions for a number of
LHC processes. A distribution for minimum bias, i.e.,
low $|p_T|$ scattering, events is shown for illustration
purposes, where instead of jets, tracks with $|p_T|$ above 3 GeV
are used with the highest $|p_T|$ track providing the rapidity
tag. Distributions for SM processes studied in this paper,
Z+jets, W+jets, $\gamma$+jets and QCD jets, appear
approximately in the middle of the available $R_{N_j}$ range, not far
from that of the minimum bias events. The $t\bar{t}$ process
contributes at higher $R_{N_j}$, due to the large top quark
mass. Distributions for LM4 and LM6 mSUGRA
benchmarks in the Z+jets and lepton+jets+$E_T^{miss}$ channels
appear at higher $R_{N_j}$ of about 0.8.

The Z+jets channel has little background, so
identification of a new physics signal within it could be
unambiguous. This is illustrated in Figure 10 (a), where the
$R_{N_j}$ distributions for SM Z+jets, with and without a
new physics contribution (LM4 mSUGRA benchmark),
are presented. The same threshold on jet $|p_T|$ of 50 GeV
as in Figure 5 is used. Black markers show the SM
Z+jets $R_{N_j}$ distribution. It is reproduced accurately
in a sample with LM4 by requiring $E_T^{miss} < 50$ GeV,
as shown in shaded markers. Alternatively, the SM
Z+jets $R_{N_j}$ shape in the sample with LM4 can be
obtained based on 1 ≤ Nj ≤ 3, where the relative
contribution from LM4 is negligible. The new physics signal
stands out clearly at Nj > 5 without any requirements
on $E_T^{miss}$.

The W+jets channel is complicated by the $t\bar{t}$
contribution, as discussed in section VII. Figure 10 (b) shows
the $R_{N_j}$ distribution for a combined W+jets and $t\bar{t}$ sample,
without (black) and with (shaded and open) an LM6
mSUGRA signal. As in Figure 8, a jet $|p_T|$ threshold of
50 GeV is used and $M_T$ is required to be greater than
$M_W + 150$ GeV to suppress SM backgrounds. The
integrated luminosity of the data sample is 1 fb$^{-1}$.
Similarly to the search in Z+jets, the SM reach in $R_{N_j}$ at
high Nj can be constrained by using the sample with
LM6 and requiring $M_T < 50$ GeV, as shown in shaded
markers. There is a large signal excess at Nj > 4,
but the discriminating power of the search in $R_{N_j}$ in
the lepton+jets+$E_T^{miss}$ signature for low mass mSUGRA
models is limited by the residual $t\bar{t}$ contribution. The
identification in $R_{N_j}$ of new physics producing a larger
number of jets than low mass mSUGRA models
could be possible.

FIG. 9: $R_{N_j}$ distributions for minimum bias events (track
based, see the text), $t\bar{t}$ (crosses), Z+jets (squares),
W+jets (circles), $\gamma$+jets (triangles-up), QCD jets (triangles-down)
and new physics signals (stars).

FIG. 10: Plot (a): $R_{N_j}$ distributions for SM Z+jets (black
markers) and a mixture of Z+jets and events for LM4
mSUGRA benchmark (shaded markers: estimated central
SM background, open markers: all central events). Plot (b):
$R_{N_j}$ distributions for W+jets (black markers), W+jets and
$t\bar{t}$ (shaded markers) and a mixture of W+jets, $t\bar{t}$ and events
for LM6 mSUGRA benchmark (open markers). In both plots,
a jet threshold of 50 GeV is used; selection criteria on $E_T^{miss}$ or
$M_T$ are given in the legend.

The search in $R_{N_j}$ is based on the distribution of tags
in (pseudo-)rapidity in events from the same Nj bin. One
can include additional information in the search from
event yields in neighboring bins. At sufficiently high Nj,
additional jets are produced via higher order QCD
processes, so that the Nj distributions fall steeply in that
regime. Selection criteria imposed on object $|p_T|$
thresholds and $E_T^{miss}$ can significantly modify the Nj spectra.
However, a very general expectation is that the SM Nj
yields fall approximately exponentially at high Nj, while
new physics can modify this behavior. We can use that expectation
without relying heavily on the shape of the Nj spectrum.

To that end, we consider another observable R^^ = 



Forward 



T^Ccntral /fT^Forw 
Nj /\ i Nj-1 



y Central^ where y Nj j g thc eyent 

yield in the Nj bin. It is identical to Rn t but in the 
denominator the forward yield in the Nj — 1 bin is used. 



Similarly, one can define $R^{(-2)}_{N_j}$, where the
denominator includes the forward yield in the $N_j - 2$ bin.
Figures 11 and 12 show $R^{(-1)}_{N_j}$ and $R^{(-2)}_{N_j}$ for the
Z+jets and W+jets samples using the previously described
selection. The signal excess is clear and enhanced in the
Z+jets sample. For the W+jets sample, the signal shape
also has better separation from the background shape
than in Figure 10. These variables are less robust than
$R_{N_j}$, but they have higher discriminating power against
the background.
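
The three ratios can be computed from binned yields with a single helper, sketched below. It assumes that $R_{N_j}$ is the central fraction $Y^{Central}_{N_j} / (Y^{Forward}_{N_j} + Y^{Central}_{N_j})$, as the text indicates; the helper name `central_fraction` and the toy yields are hypothetical.

```python
def central_fraction(central, forward, n_jets, shift=0):
    """Return Y^C_N / (Y^F_{N-shift} + Y^C_N), or None if a bin is missing.

    shift=0 gives R_Nj, shift=1 gives R^(-1)_Nj, shift=2 gives R^(-2)_Nj.
    """
    c = central.get(n_jets)
    f = forward.get(n_jets - shift)
    if c is None or f is None or (c + f) == 0:
        return None
    return c / (f + c)

# Hypothetical central and forward yields per jet multiplicity.
toy_central = {2: 600.0, 3: 150.0, 4: 40.0}
toy_forward = {2: 400.0, 3: 100.0, 4: 25.0}

r0 = central_fraction(toy_central, toy_forward, 4)           # 40 / (25 + 40)
r1 = central_fraction(toy_central, toy_forward, 4, shift=1)  # 40 / (100 + 40)
r2 = central_fraction(toy_central, toy_forward, 4, shift=2)  # 40 / (400 + 40)
```

With these toy inputs, doubling the central yield in the 4-jet bin raises the shifted ratios by a larger relative amount than the unshifted one, which is one way to picture the higher discriminating power mentioned above.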



Using quantities like $R_{N_j}$, $R^{(-1)}_{N_j}$ or $R^{(-2)}_{N_j}$ could allow
direct comparison across several signatures: those
considered in this paper as well as others, such as same-sign
or opposite-sign di-leptons, jets and $E_T^{miss}$. As such, they
could be used to quickly perform a comprehensive search
for new physics across multiple signatures in a few simple
distributions.



IX. SYSTEMATIC UNCERTAINTIES



FIG. 11: Plot (a): $R^{(-1)}_{N_j}$ distributions for Z+jets (black
markers) and a mixture of Z+jets and events for the LM4
mSUGRA benchmark (open markers). Plot (b): $R^{(-1)}_{N_j}$
distributions for W+jets and a mixture of W+jets, $t\bar{t}$ and events
for the LM6 mSUGRA benchmark. In both plots, a jet threshold
of 50 GeV is used; selection criteria on $E_T^{miss}$ or $M_T$ are given
in the legend.

FIG. 12: Plot (a): $R^{(-2)}_{N_j}$ distributions for Z+jets (black
markers) and a mixture of Z+jets and mSUGRA benchmark
events (open markers). Plot (b): $R^{(-2)}_{N_j}$
distributions for W+jets and a mixture of W+jets, $t\bar{t}$ and events
for the LM6 mSUGRA benchmark. In both plots, a jet threshold
of 50 GeV is used; selection criteria on $E_T^{miss}$ or $M_T$ are given
in the legend.

The background estimation method discussed in this
paper is not subject to the theoretical and experimental
systematic uncertainties usually associated with MC
simulation, since the background shapes and normalization
are measured from data. Instead, systematic uncertainties
come from the statistical precision of extrapolating
event yields from large to small rapidity and from
uncertainties in the validity of a linear extrapolation in $R_{N_j}$.
There are several sources of extrapolation bias.

SM processes in which jets are produced via a mechanism
other than initial or final state radiation could bias
the background prediction. The effect of $t\bar{t}$ discussed
above is an extreme example. Di-boson production is
another: for example, WZ production with a hadronic W
boson decay peaks at $N_j \approx 2$ in the Z+jets channel.
The cross-sections for di-boson processes can be measured,
but even if they are not, they are small enough that their
contributions are negligible.

A linear extrapolation in $R_{N_j}$ is valid only approximately.
Large correlations between $N_j$ and the rapidity
dependence of the tag can lead to a bias. For example, for
$N_j = 1$ in the $\gamma$+jets sample, the $p_T$ of the $\gamma$ used for
the rapidity tag is directly correlated with the $p_T$ of the
recoiling jet. The effect of correlations can be measured
by varying the threshold and identification requirements
for jets, leptons, photons and $E_T^{miss}$. Lowering thresholds
will suppress sensitivity to massive new particles and
result in a wider $N_j$ range that is background dominated.
Such background samples could be used for systematic
studies, such as comparisons of alternative (i.e., non-linear)
parametrizations and of different $N_j$ fit ranges. Varying
the $\eta$ ranges used to define forward and central events
would have similar utility.
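
The linear extrapolation and the fit-range variation discussed above can be sketched as follows. The sketch assumes $R_{N_j} = Y^{Central}_{N_j} / (Y^{Forward}_{N_j} + Y^{Central}_{N_j})$, so that a predicted ratio and a measured forward yield give the central background as $Y^{Central} = Y^{Forward} R / (1 - R)$. All names and numbers are hypothetical, and the unweighted fit ignores statistical uncertainties that a real analysis would include.

```python
def fit_line(xs, ys):
    """Unweighted least-squares straight line; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def predict_central(forward_yield, r_pred):
    """Invert R = C / (F + C) to obtain C = F * R / (1 - R)."""
    return forward_yield * r_pred / (1.0 - r_pred)

# Hypothetical ratios measured in background-dominated low-N_j bins.
low_nj = [1, 2, 3]
r_low = [0.50, 0.52, 0.54]
forward_at_5 = 30.0  # hypothetical forward yield in the N_j = 5 bin

# Nominal fit over all low-N_j bins, extrapolated to N_j = 5.
intercept, slope = fit_line(low_nj, r_low)
r_pred_5 = intercept + slope * 5
central_pred_5 = predict_central(forward_at_5, r_pred_5)

# Alternative fit range (dropping N_j = 1) as a simple systematic check.
intercept_alt, slope_alt = fit_line(low_nj[1:], r_low[1:])
r_pred_5_alt = intercept_alt + slope_alt * 5
range_shift = abs(r_pred_5_alt - r_pred_5)
```

Here `range_shift` vanishes because the toy inputs lie exactly on a line; with real data, its size relative to the statistical uncertainty would indicate how sensitive the prediction is to the choice of fit range.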

The usage of different, in-situ control samples is im- 
portant to optimize and validate the final algorithm with 
data, and quantify its systematic biases. We expect 
that dominant systematic uncertainties will be associated 
with statistical uncertainties in such control samples. 



X. CONCLUSION 

We have presented a new method to predict SM
backgrounds within the context of a search for new
phenomena in final states with multiple jets: Z+jets, W+jets,
$\gamma$+jets and multi-jets. The fraction of central events,
measured in events with few jets, is used to extrapolate
the backgrounds measured in the forward region into the
central region for events with many jets. This fraction
of central events is identified as a new discriminator
between SM and heavy new particles, and it could be useful
in any new physics search at the LHC.

The method performs well in robustness tests with
and without a requirement on the presence of significant
missing transverse energy. We have discussed systematic
uncertainties associated with the method and procedures 
to estimate them. The usage of a ratio cancels many ex- 
perimental uncertainties, and the data-driven procedure 
avoids theoretical uncertainties. This analysis could be 
performed without recourse to MC in early LHC data, 
when robustness against imperfections of background 
modeling and detector simulation can be a key to the 
discovery of new phenomena.